
[QNN][TFLite] Parsing TFLite quantized models. #3900

Merged: 1 commit merged into apache:master on Oct 14, 2019

Conversation

@anijain2305 (Contributor) commented Sep 5, 2019:

As of now, the QNN dialect has enough QNN ops to cover the Inception models. This PR takes a pre-quantized TFLite model and parses it into a handful of QNN operators.

@jackwish @FrozenGene @apivovarov @zhiics @shoubhik

@anijain2305 force-pushed the tflite_parser branch 3 times, most recently from 0f75650 to a0e7be2, on September 5, 2019 21:31
@@ -1030,4 +1153,6 @@ def from_tflite(model, shape_dict, dtype_dict):
     outputs = [exp_tab.get_expr(get_tensor_name(subgraph, i)) for i in model_outputs]
     outputs = outputs[0] if len(outputs) == 1 else _expr.Tuple(outputs)
     func = _expr.Function(analysis.free_vars(outputs), outputs)
-    return _module.Module.from_expr(func), params
+    mod = _module.Module.from_expr(func)
+    mod = tvm.relay.qnn.transform.CanonicalizeOps()(mod)
anijain2305 (Contributor, Author):

@zhiics @vinx13 @tqchen Should we remove this from here and make it a part of build_module.cc?

@anijain2305 marked this pull request as ready for review on September 5, 2019 21:39
@@ -848,6 +848,25 @@ def test_forward_inception_v4_net():
tvm.testing.assert_allclose(np.squeeze(tvm_output[0]), np.squeeze(tflite_output[0]),
rtol=1e-5, atol=1e-5)

def test_forward_quantized_inception_v1_net():
Contributor:

Can we rename it to test_forward_qnn_inception_v1_net?

python/tvm/relay/frontend/tflite.py (outdated review thread, resolved)
@anijain2305 force-pushed the tflite_parser branch 4 times, most recently from 1e47074 to ab3b78d, on September 28, 2019 01:22
@mshawcroft (Contributor):

Looks like the latest version of this PR depends on #4033 for the renaming of qnn.quantized_dense to qnn.dense.

@anijain2305 (Contributor, Author):

@jackwish @FrozenGene @apivovarov @zhiics @shoubhik

Ping to review, please. This has been parked for some time. I think it will benefit users who want to read pre-quantized models.

@mshawcroft (Contributor):

This patch is a big step forward; the latest version works for a number of networks, so it would be good to get it merged sooner rather than later.

There are a few common failure cases that we see; we are not sure whether these are expected or unexpected at the moment:

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.requantize

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.dequantize

@FrozenGene (Member):

@anijain2305 I am enjoying the National Day holiday and plan to come back Oct 11. Would you mind waiting for my review until then? Thanks.

@FrozenGene (Member):

@mshawcroft your issue should not be related to this PR, but it should be resolved. I think registering these ops as injective is enough.
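For reference, a minimal sketch of what registering these ops as injective could look like, assuming the usual Relay op registry helpers; this registration is not part of this PR, and the op names are simply taken from the error messages above:

    from tvm.relay.op import register_pattern, OpPattern

    # Mark the QNN ops as injective so the compile engine knows how to
    # fuse and schedule them (illustrative only; the actual fix discussed
    # in this thread is to run the QNN canonicalization pass, see below).
    register_pattern("qnn.requantize", OpPattern.INJECTIVE)
    register_pattern("qnn.dequantize", OpPattern.INJECTIVE)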

@anijain2305 (Contributor, Author) commented Oct 1, 2019:

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.requantize

TVMError: Check failed: idx < data_.size() && data_[idx].second != 0: Attribute TOpPattern has not been registered for Operator qnn.dequantize

Thanks for checking it on many networks. You are seeing this error because you have to call

mod = tvm.relay.qnn.transform.CanonicalizeOps()(mod)

See the changes made in this PR in the file - tests/python/frontend/tflite/test_forward.py

Pre-quantized models go through a Relay dialect called QNN. There are a couple of QNN passes that convert all QNN ops into sequences of existing Relay ops. Currently, these QNN passes are not part of relay.build and have to be called explicitly. We are working on simplifying this in another PR (#3971). Both PRs have been sitting idle for some time. Sorry for the inconvenience; we will try to get them in soon.
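For context, a minimal end-to-end sketch of the flow described above, assuming a pre-quantized .tflite file and the standard TFLite flatbuffer Python package; the file name, input name, and input shape are illustrative:

    import tflite.Model
    import tvm
    from tvm import relay

    # Load the pre-quantized TFLite flatbuffer (path is illustrative).
    with open("inception_v1_quant.tflite", "rb") as f:
        tflite_model = tflite.Model.Model.GetRootAsModel(f.read(), 0)

    # Parse into Relay; quantized tensors come out as QNN ops.
    mod, params = relay.frontend.from_tflite(
        tflite_model,
        shape_dict={"input": (1, 224, 224, 3)},
        dtype_dict={"input": "uint8"})

    # Explicit QNN canonicalization, as described above
    # (later folded into relay.build by #3971).
    mod = tvm.relay.qnn.transform.CanonicalizeOps()(mod)

    # From here on it is the usual Relay build flow.
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(mod, target="llvm", params=params)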

@anijain2305 (Contributor, Author):

Now that #3971 has been merged, this PR is the only missing piece for reading pre-quantized TFLite models.

Reviews will be greatly appreciated - @jackwish @apivovarov @zhiics @shoubhik @tqchen @yzhliu @u99127

@u99127 (Contributor) left a comment:

Otherwise LGTM.

assert len(output_tensors) == 1, "output tensors should be 1"
output_tensor = output_tensors[0]
assert self.has_same_qnn_params(input_tensor, output_tensor), \
"qnn.op.reshape requires input and output qnn params to be same"
Contributor:

If this is a requirement of TFLite quantized outputs, then the error message should be:

"tflite.reshape requires input and output scale and zero_points to be the same"

For the else: branch, should this not be NotImplementedError(...)?

anijain2305 (Contributor, Author):

Thanks. Will update the message.

out = _op.nn.avg_pool2d(in_expr, **params)
else:
assert self.has_same_qnn_params(input_tensor, output_tensor), \
"qnn.op.avg_pool2d requires input and output qnn params to be same"
Contributor:

Same comment as the other one?

anijain2305 (Contributor, Author):

Yes, I will change the message.

@@ -153,13 +155,24 @@ def get_tensors(self, tensors_idx_list):
         return_list = list()
         for tensor_idx in tensors_idx_list:
             if tensor_idx < 0:
-                return_list.append(TensorWrapper(tensor_idx, 0, 0))
+                return_list.append(TensorWrapper(tensor_idx, 0, 0, None))
Member:

The default value is None; we don't need to change this line of code.

Comment on lines 173 to 177:

    if qnn_params['scale'] == 0 and qnn_params['zero_point'] == 0:
        qnn_params = None
    return_list.append(TensorWrapper(tensor_idx, tensor, buffer, qnn_params))
Member:

A better way is to check if qnn_params['scale'] != 0 or qnn_params['zero_point'] != 0:, because the default value of qnn_params is None on line 166.
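A hedged sketch of the suggested restructuring; the flatbuffer accessor names follow the TFLite Python bindings, and the surrounding variable names are assumptions rather than the exact code in tflite.py:

    # Only attach qnn_params when the tensor carries real quantization info;
    # otherwise leave it as the default None.
    qnn_params = None
    tflite_qnn_params = tensor.Quantization()
    if tflite_qnn_params is not None:
        scale = float(tflite_qnn_params.ScaleAsNumpy())
        zero_point = int(tflite_qnn_params.ZeroPointAsNumpy())
        if scale != 0 or zero_point != 0:
            qnn_params = {'scale': scale, 'zero_point': zero_point}
    return_list.append(TensorWrapper(tensor_idx, tensor, buffer, qnn_params))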

Comment on lines 215 to 217:

    def has_same_qnn_params(self, tensor_a, tensor_b):
        return tensor_a.qnn_params['scale'] == tensor_b.qnn_params['scale'] and \
               tensor_a.qnn_params['zero_point'] == tensor_b.qnn_params['zero_point']
Member:

Minor code style suggestion: tensor_a could be lhs_tensor and tensor_b could be rhs_tensor, like we did in convert_elemwise.


# If the tensors are quantized, ensure that input/output qnn params are same.
if input_tensor.qnn_params is not None:
Member:

This could be simplified to: if input_tensor.qnn_params:

# If the tensors are quantized, ensure that input/output qnn params are same.
if input_tensor.qnn_params is not None:
output_tensors = self.get_output_tensors(op)
assert len(output_tensors) == 1, "output tensors should be 1"
Member:

"output tensors should be 1" -> "The length of output tensors should be 1"


# Softmax does not go well with Int8. So Dequantize back to FP32.
Member:

Better to leave a TODO. Softmax's int8 computation is complex, but we could implement it when we have spare time. The same applies to operators like sigmoid / mean.

@@ -738,7 +819,15 @@ def convert_conv(self, op, conv_type):
             raise tvm.error.OpAttributeUnImplemented(
                 'Padding format {} is not supported for operator Conv.'.format(padding))

-        out = _op.nn.conv2d(data=in_expr, weight=weight_expr, **params)
+        # Check if the inputs are quantized. If yes, call qnn conv2d.
+        if input_tensor.qnn_params is None:
Member:

Same comment as before.

.format('qnn.op.dense'))

# Finally if the dense is quantized. Add a requantize at the end.
if output_tensor.qnn_params is not None:
Member:

Same comment as before.

@@ -581,7 +640,23 @@ def convert_fully_connected(self, op):

         # If we have fused activations
         if fused_activation_fn != ActivationFunctionType.NONE:
-            out = self.convert_fused_activation_function(out, fused_activation_fn)
+            if output_tensor.qnn_params is None:
Member:

Same comment as before.

@@ -1029,3 +1048,6 @@ def test_forward_ssd_mobilenet_v1():
     test_forward_inception_v3_net()
     test_forward_inception_v4_net()
     test_forward_ssd_mobilenet_v1()
+
+    # End to End quantized
+    test_forward_qnn_inception_v1_net()
Member:

Let us add the MobileNet V2 quant model if this PR can run it, because it is very popular and often compared across different frameworks.

anijain2305 (Contributor, Author):

Makes sense. Till now I have been focusing on the server side, and thus I used Inception. Can this PR go in first, with a separate PR for MobileNet V2 to follow?

Member:

I have no problem with that. However, one question: do we have any TODO items for the MobileNet V2 quant model? To my knowledge, this PR should already handle the MobileNet V2 quant model, and we would just need to add its test code.

@anijain2305 (Contributor, Author) commented Oct 12, 2019:

I don't think there is. I want to do a full 50k-image test before adding MobileNetV2. I have done that for InceptionV1 and ensured that its accuracy is the same as that reported by TFLite. I will set up the pre-processing for MobileNet next week, check end-to-end accuracy, and send a separate PR. What do you say?

Member:

I don't think we need to do what you mention. Because we are parsing a TFLite quant model, the TFLite framework is the baseline. We could treat the TFLite quant model like a TFLite FP32 model and compare our result against the TFLite framework; there is no difference. I think we only need end-to-end accuracy when we do quantization inside TVM, or when we have a strong reason to implement ops in a way that does not keep compatibility with the TFLite framework.

anijain2305 (Contributor, Author):

Added MobileNet V1. MobilenetV2 fails with a segfault in LLVM. Will debug.

Member:

What is the detailed information of the LLVM report? Have you tried the latest LLVM?

anijain2305 (Contributor, Author):

This is the whole stack trace. It happens during graph runtime creation; the Relay build has already passed.

0x00007fff78dd1e64 in llvm::EVT::getExtendedVectorNumElements() const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
(gdb) bt
#0  0x00007fff78dd1e64 in llvm::EVT::getExtendedVectorNumElements() const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#1  0x00007fff78ac1090 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#2  0x00007fff78ac0318 in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#3  0x00007fff78abe55f in llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::KnownBits&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const ()
   from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#4  0x00007fff789c7b65 in (anonymous namespace)::DAGCombiner::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#5  0x00007fff789a4339 in (anonymous namespace)::DAGCombiner::visit(llvm::SDNode*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#6  0x00007fff7897670c in (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#7  0x00007fff78975cd3 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#8  0x00007fff78aaa7b2 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#9  0x00007fff78aa9939 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#10 0x00007fff78aa6c06 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#11 0x00007fff787c2ffe in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#12 0x00007fff78cb0ff4 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#13 0x00007fff7976c4ba in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#14 0x00007fff7976c843 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#15 0x00007fff7976cdbf in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#16 0x00007fff78bdee38 in llvm::MCJIT::emitObject(llvm::Module*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#17 0x00007fff78bdf094 in llvm::MCJIT::generateCodeForModule(llvm::Module*) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#18 0x00007fff78be038e in llvm::MCJIT::findSymbol(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#19 0x00007fff78bdfe58 in llvm::MCJIT::getSymbolAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
   from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#20 0x00007fff78be04ba in llvm::MCJIT::getGlobalValueAddress(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#21 0x00007fff77dcb794 in tvm::codegen::LLVMModuleNode::LazyInitJIT() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#22 0x00007fff77dca02e in tvm::codegen::LLVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<tvm::runtime::ModuleNode> const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#23 0x00007fff77e0db68 in tvm::runtime::GraphRuntime::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor, std::allocator<DLTensor> > const&, unsigned long) ()
   from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#24 0x00007fff77e0afe4 in tvm::runtime::GraphRuntime::SetupOpExecs() () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#25 0x00007fff77e09aad in tvm::runtime::GraphRuntime::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLContext, std::allocator<DLContext> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#26 0x00007fff77e0f10b in tvm::runtime::GraphRuntimeCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLContext, std::allocator<DLContext> > const&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#27 0x00007fff77e10f8b in std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::$_12>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#28 0x00007fff77de5577 in TVMFuncCall () from /home/ubuntu/workplace/tvm/t1/tvm/build/libtvm.so
#29 0x00007fff7e54ee20 in ffi_call_unix64 () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#30 0x00007fff7e54e88b in ffi_call () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#31 0x00007fff7e54901a in _ctypes_callproc () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#32 0x00007fff7e53cfcb in ?? () from /usr/lib/python3.5/lib-dynload/_ctypes.cpython-35m-x86_64-linux-gnu.so
#33 0x00000000005c3bd7 in PyObject_Call ()
#34 0x00000000005354af in PyEval_EvalFrameEx ()
#35 0x000000000053a81b in PyEval_EvalCodeEx ()
#36 0x00000000004e3423 in ?? ()
#37 0x00000000005c3bd7 in PyObject_Call ()
#38 0x00000000004f08be in ?? ()
#39 0x00000000005c3bd7 in PyObject_Call ()

@anijain2305 (Contributor, Author) commented Oct 12, 2019:

@FrozenGene Thanks! Can you please review again? I have incorporated your comments. I would suggest having a separate PR for MobileNet; I can take care of that in the next couple of weeks.

@zhiics (Member) left a comment:

LGTM. For MobilenetV2, I think we can send a separate PR because it looks to me that it wouldn't change the structure of this PR (much). @FrozenGene what do you think?


def test_forward_qnn_mobilenet_v1_net():
"""Test the Quantized TFLite Mobilenet V1 model."""
# MobilenetV2
Member:

s/MobilenetV2/MobilenetV1

Add a todo for MobilenetV2?


Member:

MobilenetV1

tvm_output = run_tvm_graph(tflite_model_buf, data, 'input')
tvm_predictions = np.squeeze(tvm_output)
tvm_sorted_labels = tvm_predictions.argsort()[-3:][::-1]
tvm.testing.assert_allclose(tvm_sorted_labels, tflite_sorted_labels)
Member:

Could you explain more why we cannot compare tflite_output with tvm_output directly, like we do for FP32? I think we would get the same result if we kept compatibility with TFLite; if not, why don't we keep compatibility?

anijain2305 (Contributor, Author):

The only difference between TFLite and QNN is rounding. The TFLite reference implementation is reverse-engineered from ARM assembly instructions (which perform 2 roundings in general). QNN follows the more formal rounding described here. If you are interested, I would encourage you to read the inlined comments in the code - https://github.com/dmlc/tvm/blob/068c148f01fcc81c72004512de320ca1cee24dc6/src/relay/qnn/util.cc#L79-L133

There was also a fair amount of discussion which has been summarized here - #3591 (comment). This should also be helpful.

This rounding can cause small variations in the output, because of which we cannot do an exact check.

I have measured accuracy of TFLite quantized Inception V1-V3 on 50K images using QNN, and observed accuracy parity with native TFLite execution.

Member:

Thanks for the information, @anijain2305! However, if I remember correctly, when I reviewed the requantize code I mentioned that we could use round_away_from_zero, which is what TFLite uses. Did we change it later?

Alright, if we changed the default value later, then in my opinion we should pass round_away_from_zero to the requantize operator when parsing TFLite models in the frontend (a sketch follows after this list). The reasons are:

  • Keeping compatibility with TFLite should be the first priority. Because we are parsing a TFLite model, the quantization accuracy is controlled by TensorFlow quantization-aware training and TFLite.
  • When we run a new TFLite quantized model, we cannot always get test suites. For example, the customers / algo team send us a TFLite model and require us to boost the performance. We should make sure the result is the same as TFLite's. The simplest way is to compare the result with TFLite, not to do end-to-end accuracy; sometimes we cannot even do that.
  • For our own development, we could also do the simple comparison with TFLite, as for FP32, instead of end-to-end accuracy. If one op needs to requantize internally, we could compare it with TFLite rather than reasoning mathematically about whether a differing result is correct.
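A hedged sketch of the kind of change being proposed here, assuming the requantize op exposes a rounding knob; the option name "TFLITE" is illustrative and not implemented in this PR:

    # Hypothetical: the TFLite frontend pins a TFLite-compatible rounding mode
    # on every requantize it emits, so the parser (not the user) owns
    # compatibility with the TFLite reference kernels.
    out = _qnn.op.requantize(out,
                             input_scale=input_scale,
                             input_zero_point=input_zero_point,
                             output_scale=output_tensor.qnn_params['scale'],
                             output_zero_point=output_tensor.qnn_params['zero_point'],
                             rounding="TFLITE",  # assumed option name
                             out_dtype=output_tensor_type_str)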

@anijain2305 (Contributor, Author) commented Oct 13, 2019:

TFLite rounding is not exactly the formal round_away_from_zero. You might already know, but these are the two functions involved in TFLite rounding:

  using gemmlowp::RoundingDivideByPOT;
  using gemmlowp::SaturatingRoundingDoublingHighMul;

More at - https://github.com/tensorflow/tensorflow/blob/ff91cd691027076e6128afbd902d3d38e3672787/tensorflow/lite/kernels/internal/common.h#L105

The key idea is that the above two functions map to ARM assembly instructions. But, in the process, they perform an approximation that makes the result a little different from formal GNU C rounding. The reasoning behind doing that is presumably performance.

In QNN, we decided to follow GNU C rounding, as that is more formal and can work across more frameworks (like MXNet and PyTorch, not just TFLite).

However, if one wants, one can implement TFLite rounding. The code is modular enough to add one more rounding option. I think you have a good familiarity with TFLite, so I would encourage you to write the TFLite rounding if you get time :)
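A tiny numeric illustration of the "2 roundings" point above, under the simplifying assumption that each rounding step is a round-half-away-from-zero right shift; this is not the actual gemmlowp fixed-point code:

    def round_half_away_shift(x, shift):
        # round(x / 2**shift) with ties rounded away from zero, for x >= 0
        return (x + (1 << (shift - 1))) >> shift

    product = 15                                    # toy fixed-point accumulator
    single = round_half_away_shift(product, 5)      # round(15 / 32)           -> 0
    double = round_half_away_shift(
        round_half_away_shift(product, 2), 3)       # round(round(15 / 4) / 8) -> 1
    print(single, double)                           # rounding twice moved the LSB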

@FrozenGene (Member) commented Oct 14, 2019:

I fully understand your point about supporting more frameworks. My previous point is to emphasize that we could provide an option in our TFLite frontend parser so that we keep compatibility with TFLite; we don't need to change any default value in our existing code, which can be used for MXNet, PyTorch, and so on. Otherwise, users will be confused about why our result is not the same as TFLite's when they use TVM. Keeping compatibility with TFLite should be the first priority when we parse a TFLite model, so I think we shouldn't leave this for TVM users to implement.

Member:

However, as you said, it doesn't affect the code structure. If you agree with the compatibility point, I think we could merge this PR first and open an issue reminding us to do this. If you don't have spare time, I could help manage it next.

anijain2305 (Contributor, Author):

Yes, we can open an issue to track the TFLite rounding. It would not affect the code structure. We can set the rounding value in the TFLite parser, hiding it from TVM users.

Member:

OK. Could you help open an issue to track it? And you could assign the task to me.

@anijain2305 (Contributor, Author):

@u99127 I have incorporated your comments. Please review again. Also, please approve explicitly; it helps the committers :)

@zhenhuaw-me (Contributor) left a comment:

LGTM generally; some code style comments and model-specific concerns only. Thank you for the ping, and sorry for the delayed feedback. I was on holiday and busy with some other projects.

One thing I'd like to comment on: since the QNN operator implementation is different from TFLite's (I mean it does not try to emulate floating point with integers inside operators like quantize/dequantize/requantize), making the QNN dialect output bit-exact, element-wise, with TFLite does not seem to be a must (TFLite becomes a reference but not a mirror) - that is not the target of these QNN operators. The evaluation of the design should be the model accuracy, I think.

input_scale=input_tensor.qnn_params['scale'],
input_zero_point=input_tensor.qnn_params['zero_point'])

out = _op.nn.softmax(in_expr, **params)
Contributor:

This seems to generate a float output for softmax. Would it be better if we quantized it back to uint8, since the semantics of a uint8 model require a uint8 output?

anijain2305 (Contributor, Author):

That would indeed be better. Doing that.
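A hedged sketch of what "doing that" could look like in the softmax converter, using the qnn.quantize op; the exact parameter spelling follows my reading of the QNN API and may differ from the final code:

    out = _op.nn.softmax(in_expr, **params)
    # Quantize the float32 softmax result back to uint8 so the graph output
    # matches the quantized model's output tensor (sketch, not the merged code).
    out = _qnn.op.quantize(out,
                           output_scale=output_tensor.qnn_params['scale'],
                           output_zero_point=output_tensor.qnn_params['zero_point'],
                           out_dtype='uint8')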

Comment on lines +444 to +461
raise tvm.error.OpNotImplemented(
'Operator {} with fused activation is not supported yet.'
.format('qnn.op.concatenate'))
Contributor:

Not sure if we can have a clamp here to support it...
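A hedged sketch of the clamp idea for a fused RELU6: clip in the quantized domain between the quantized representations of 0.0 and 6.0. Variable names mirror the surrounding converter code and are assumptions:

    scale = output_tensor.qnn_params['scale']
    zero_point = output_tensor.qnn_params['zero_point']
    # Quantized bound: q = zero_point + round(real_value / scale)
    a_min = float(zero_point)                       # real 0.0
    a_max = float(zero_point + round(6.0 / scale))  # real 6.0
    out = _op.clip(out, a_min=a_min, a_max=a_max)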

anijain2305 (Contributor, Author):

We also need to handle different types of activations like ReLU/ReLU6. I have a draft implementation of activation handling; I will send a separate PR for that once I test it with a few cases.

@@ -584,7 +639,14 @@ def convert_fully_connected(self, op):
         weight_value = self.get_tensor_value(weight_tensor)
         weight_expr = self.exp_tab.new_const(weight_value, dtype=weight_tensor_type_str)

-        out = _op.nn.dense(in_expr, weight_expr)
+        # Check if the inputs are quantized. If yes, call qnn dense.
Contributor:

Pretty easy-to-read logic, making the comment unnecessary to me :)

anijain2305 (Contributor, Author):

:)

Comment on lines 643 to 661:

    if not input_tensor.qnn_params:
        out = _op.nn.dense(in_expr, weight_expr)
    else:
        out = _qnn.op.dense(in_expr, weight_expr,
                            input_zero_point=input_tensor.qnn_params['zero_point'],
                            kernel_zero_point=weight_tensor.qnn_params['zero_point'],
                            out_dtype='int32')
Contributor:

Generally, I suggest writing such logic as below, because negated conditions ("not" logic) are harder for humans to read:

if input_tensor.qnn_params:
    # path 1
else:
    # path 2

anijain2305 (Contributor, Author):

You are right. Readability is very important. Will change the style.

Comment on lines +872 to +887
raise tvm.error.OpNotImplemented(
'Operator {} with fused activation is not supported yet.'
.format('qnn.op.conv2d'))
Contributor:

That's interesting. AFAIK, the TFLite model converter always fuses the ReLU-family activations into Conv/FC operators. Are the activations of the Inception model you are using not fused?

anijain2305 (Contributor, Author):

Yeah, the TFLite hosted models do not have any fused activation. There is a fused activation in the object detection models, but we are far away from supporting object detection (it basically depends on control flow support).

Comment on lines +877 to +898:

    if output_tensor.qnn_params:
        input_scale = input_tensor.qnn_params['scale'] * weight_tensor.qnn_params['scale']
        input_zero_point = 0
        out = _qnn.op.requantize(out,
                                 input_scale=input_scale,
                                 input_zero_point=input_zero_point,
                                 output_scale=output_tensor.qnn_params['scale'],
                                 output_zero_point=output_tensor.qnn_params['zero_point'],
                                 out_dtype=output_tensor_type_str)
Contributor:

Maybe we can have a small function, requantize_output(out, output_tensor, input_tensor), to wrap this :)
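A hedged sketch of the suggested helper, built from the requantize call quoted above; it was not added in this PR:

    def requantize_output(self, out, input_scale, output_tensor, output_tensor_type_str):
        # Shared trailing requantize for conv/dense conversions (sketch only).
        return _qnn.op.requantize(out,
                                  input_scale=input_scale,
                                  input_zero_point=0,
                                  output_scale=output_tensor.qnn_params['scale'],
                                  output_zero_point=output_tensor.qnn_params['zero_point'],
                                  out_dtype=output_tensor_type_str)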

anijain2305 (Contributor, Author):

Good suggestion, but I will skip this one for now. Currently there are only two uses of requantize, both of which are quite specific - we have to multiply the input scales to get the new input scale. As we add more operators, and thus more requantize calls, I will revisit this.

assert self.has_same_qnn_params(input_tensor, output_tensor), \
"TFLite avg_pool2dreshape requires input and output scale" \
"and zero points to be equal"
out = _op.cast(in_expr, dtype="int32")
Contributor:

Style consistency of " vs ' :)

@zhiics (Member) commented Oct 14, 2019:

Thanks @anijain2305 @jackwish @FrozenGene @mshawcroft @u99127 for the hard work on this PR. Since we now have approvals from several organizations and the parser is in good shape, I am going to merge it.

@anijain2305 please be aware of some further work on it, like MobileNetV2 and activation handling.

@zhiics zhiics merged commit bfb811c into apache:master Oct 14, 2019
@anijain2305 (Contributor, Author):

Thanks @zhiics

Yes, I will follow up on both MobilenetV2 and activation handling. I will also open a discuss forum thread to address @FrozenGene's concerns regarding exact matching of TFLite output.
